deep learning accelerator
Gradient Estimation Methods of Approximate Multipliers for High-Accuracy Retraining of Deep Learning Models
Meng, Chang, Burleson, Wayne, De Micheli, Giovanni
Approximate multipliers (AppMults) are widely used in deep learning accelerators to reduce their area, delay, and power consumption. However, AppMults introduce arithmetic errors into deep learning models, necessitating a retraining process to recover accuracy. A key step in retraining is computing the gradient of the AppMult, i.e., the partial derivative of the approximate product with respect to each input operand. Existing approaches typically estimate this gradient using that of the accurate multiplier (AccMult), which can lead to suboptimal retraining results. To address this, we propose two methods to obtain more precise gradients of AppMults. The first, called LUT-2D, characterizes the AppMult gradient with 2-dimensional lookup tables (LUTs), providing fine-grained estimation and achieving the highest retraining accuracy. The second, called LUT-1D, is a compact and more efficient variant that stores gradient values in 1-dimensional LUTs, achieving comparable retraining accuracy with shorter runtime. Experimental results show that on CIFAR-10 with convolutional neural networks, our LUT-2D and LUT-1D methods improve retraining accuracy by 3.83% and 3.72% on average, respectively. On ImageNet with vision transformer models, our LUT-1D method improves retraining accuracy by 23.69% on average, compared to a state-of-the-art retraining framework. Modern artificial intelligence (AI) technologies excel in a wide range of areas such as natural language processing and computer vision. However, this rapid growth raises serious concerns about power consumption [1]. To achieve energy-efficient deep learning accelerators, researchers have adopted an emerging design paradigm called approximate computing, which reduces power consumption at the cost of errors [2], [3]. Approximate computing is particularly suitable for deep learning accelerators, since they are inherently resilient to errors and noise.
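The LUT idea can be sketched in a few lines of Python. This is a toy illustration, not the authors' implementation: the 2-LSB-truncating multiplier, the finite-difference gradient, and the averaging used to compress LUT-2D into LUT-1D are all assumptions made for the example.

```python
def app_mult(a, b):
    """Toy 4-bit AppMult (assumption): exact product with its two low bits truncated."""
    return (a * b) & ~0b11

N = 16  # 4-bit unsigned operand range

def grad_a(a, b):
    """Finite-difference estimate of d(app_mult)/da at operand pair (a, b)."""
    lo, hi = max(a - 1, 0), min(a + 1, N - 1)
    return (app_mult(hi, b) - app_mult(lo, b)) / (hi - lo)

# LUT-2D flavour: one gradient entry per (a, b) operand pair.
lut2d = [[grad_a(a, b) for b in range(N)] for a in range(N)]

# LUT-1D flavour: for an accurate multiplier d(a*b)/da = b, so the gradient
# is (mostly) a function of b alone; average the 2-D table over a to get an
# O(N) table instead of O(N^2).
lut1d = [sum(lut2d[a][b] for a in range(N)) / N for b in range(N)]
```

For this truncating multiplier the 1-D table stays close to the accurate gradient `b`, which is why the compact variant can retain most of the retraining accuracy.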
- Europe (0.94)
- North America > United States > Massachusetts (0.46)
- North America > United States > Colorado (0.28)
Deep Learning Accelerator in Loop Reliability Evaluation for Autonomous Driving
The reliability of deep learning accelerators (DLAs) used in autonomous driving systems has a significant impact on system safety. However, DLA reliability is usually evaluated with low-level metrics, such as the mean square error of the output, which remain rather different from high-level metrics such as the total distance traveled before failure in autonomous driving. As a result, high-level reliability metrics evaluated at the post-silicon stage may still lead to DLA design revisions, resulting in expensive design iterations for reliable DLAs targeting autonomous driving. To address this problem, we propose a DLA-in-the-loop reliability evaluation platform that enables system-level reliability evaluation at the early DLA design stage.
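The gap between low-level and high-level metrics can be made concrete with a toy example (an assumption for illustration, not the platform itself): two weight faults in a tiny linear classifier that produce essentially identical output MSE but opposite task-level outcomes.

```python
# Toy 2-class linear classifier; all samples belong to class 0 (assumption).
XS = [(1.0, 0.2), (0.9, 0.1), (0.8, 0.3)]
W0 = [(1.0, 0.0), (0.0, 1.0)]  # row c holds the weights of class c

def scores(w, x):
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]

def accuracy(w):
    """High-level metric: fraction of samples still classified as class 0."""
    return sum(max(range(2), key=lambda c: scores(w, x)[c]) == 0 for x in XS) / len(XS)

def mse(w):
    """Low-level metric: mean square error of the outputs vs. the fault-free W0."""
    return sum((a - b) ** 2
               for x in XS
               for a, b in zip(scores(w, x), scores(W0, x))) / (2 * len(XS))

# Fault A: class-1 weight on feature 0 jumps by +1.0 -> the argmax flips.
wa = [(1.0, 0.0), (1.0, 1.0)]
# Fault B: class-0 weight on feature 0 jumps by +1.0 -> the argmax is unchanged.
wb = [(2.0, 0.0), (0.0, 1.0)]
```

Both faults perturb one score by the same amount per sample, so their output MSE matches (up to rounding), yet fault A drives task accuracy to zero while fault B leaves it untouched. This is exactly why post-silicon, system-level evaluation can contradict early MSE-based estimates.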
- Transportation > Ground > Road (1.00)
- Information Technology > Robotics & Automation (1.00)
- Automobiles & Trucks (1.00)
Reconfigurable Distributed FPGA Cluster Design for Deep Learning Accelerators
Johnson, Hans, Fang, Tianyang, Perez-Vicente, Alejandro, Saniie, Jafar
We propose a distributed system based on low-power embedded FPGAs designed for edge computing applications, focused on exploring distributed scheduling optimizations for Deep Learning (DL) workloads to obtain the best performance in terms of latency and power efficiency. Our cluster was modular throughout the experiment: our implementations consist of up to 12 Zynq-7020 chip-based boards as well as 5 UltraScale+ MPSoC FPGA boards connected through an Ethernet switch, and the cluster evaluates the configurable Deep Learning Accelerator (DLA) Versatile Tensor Accelerator (VTA). This adaptable distributed architecture is distinguished by its capacity to evaluate and manage neural network workloads in numerous configurations, enabling users to conduct multiple experiments tailored to their specific application needs. The proposed system can simultaneously execute diverse Neural Network (NN) models, arrange the computation graph in a pipeline structure, and manually allocate greater resources to the most computationally intensive layers of the NN graph.
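The pipeline-allocation idea can be sketched as a classic contiguous-partition problem (a simplified model, not the paper's scheduler): split a per-layer cost profile into one contiguous stage per board so that the slowest stage, which sets pipeline throughput, is as cheap as possible.

```python
def best_pipeline_split(costs, boards):
    """Binary-search the smallest per-stage budget that lets the layer
    costs be split into at most `boards` contiguous pipeline stages.
    Heavy layers naturally end up in stages of their own."""
    def stages_needed(budget):
        n, acc = 1, 0
        for c in costs:
            if c > budget:          # a single layer exceeds the budget
                return float("inf")
            if acc + c > budget:    # close the current stage, open a new one
                n, acc = n + 1, c
            else:
                acc += c
        return n

    lo, hi = max(costs), sum(costs)  # bounds on the optimal stage cost
    while lo < hi:
        mid = (lo + hi) // 2
        if stages_needed(mid) <= boards:
            hi = mid
        else:
            lo = mid + 1
    return lo
```

For a hypothetical profile `[4, 2, 8, 3, 3]` on 3 boards, the search settles on stages `[4, 2] | [8] | [3, 3]`, giving the computationally heaviest layer a board to itself, much as the cluster allows manually.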
- North America > United States > Illinois > Cook County > Chicago (0.04)
- Europe > Finland > Pirkanmaa > Tampere (0.04)
- Asia > Japan > Honshū > Tōhoku > Fukushima Prefecture > Fukushima (0.04)
Learn about Deep Learning Accelerators on the Jetson Orin with NVIDIA
Developers and anyone interested in learning more about the Deep Learning Accelerator on NVIDIA's Jetson Orin mini PC will be pleased to know that NVIDIA has published a new article on its technical blog providing an overview of the Deep Learning Accelerator (DLA) as used with the Jetson system, which combines a CPU and GPU into a single module, providing developers with an expansive NVIDIA software stack in a small, low-power package that can be deployed at the edge. "Though the DLA doesn't have as many supported layers as the GPU, it still supports a wide variety of layers used in many popular neural network architectures. In many instances, the layer support may cover the requirements of your model. For example, the NVIDIA TAO Toolkit includes a wide variety of pre-trained models that are supported by the DLA, ranging from object detection to action recognition. While it's important to note that the DLA throughput is typically lower than that of the GPU, it is power-efficient and allows you to offload deep learning workloads, freeing the GPU for other tasks."
Intel continues AI surge with $2bn processor firm buyout - TechHQ
Intel has struck a US$2 billion deal for Israel-based AI firm Habana Labs, a developer of programmable deep learning accelerators for the data center. The acquisition will "turbocharge" Intel's offerings for the data center, it said in a release, "with a high-performance training processor family and a standards-based programming environment to address evolving AI workloads." Navin Shenoy, Executive Vice President and General Manager of the Data Platforms Group at Intel, said: "This acquisition advances our AI strategy, which is to provide customers with solutions to fit every performance need – from the intelligent edge to the data center." Habana will remain an independent business unit and will continue to be led by its current management team, reporting to Intel's Data Platforms Group, home to Intel's broad portfolio of data center-class AI technologies. Intel said it expects the AI chip market to exceed US$25 billion by 2024.
Deep Learning Accelerator, Platform & Server - ADLINK Technology
Artificial Intelligence (AI) has the ability to innovate and advance conventional practices and business operations. To bring AI to the edge, ADLINK takes a heterogeneous approach and offers a comprehensive solution portfolio of deep learning platforms and servers including acceleration engines, inference platforms, and training servers to infuse the power of AI into the smart manufacturing, smart city, logistics and warehousing, telecommunications applications and more.
Why Micron is Getting into the AI Accelerator Business
Micron has a habit of building interesting research prototypes that offer a vague hope of commercialization, for the sheer purpose of learning how to make its own memory and storage subsystem approaches better tuned to next-generation applications. We saw this a few years ago with the Automata processor, a neuromorphic-inspired bit of hardware focused on large-scale pattern recognition. That project has since been folded internally and moved into a privately funded startup effort aiming to make it market-ready, which is to say it has all but disappeared from view in the couple of years since. There is more here for anyone interested in the Automata architecture, but for those curious about why Micron wants to get into the accelerator business with one-off silicon projects like that, or with its newly announced deep learning accelerator (DLA) for inference, it is far less about commercial success than about learning how to tune memory and storage systems for AI on custom accelerators. In fact, the market viability of such a chip would be a delightful bonus, since the real value is getting a firsthand understanding of what deep learning applications need out of memory and storage subsystems.
Micron debuts flash memory-optimized AI development platform - SiliconANGLE
Computer chipmaker and storage company Micron Technology Inc. is pitching its hardware for artificial intelligence workloads after acquiring a startup called FWDNXT. The company announced the acquisition at its annual Micron Insight conference in San Francisco today, describing FWDNXT as a provider of AI hardware and software for deep learning, which is a subset of AI that tries to mimic the way the human brain solves problems. Micron's plan is to integrate FWDNXT's technology with its own, optimized flash memory products to create what it says will be a "comprehensive AI development platform." "FWDNXT is an architecture designed to create fast-time-to-market edge AI solutions through an extremely easy to use software framework with broad modeling support and flexibility," Micron Executive Vice President and Chief Business Officer Sumit Sadana said in a statement. "FWDNXT's five generations of machine learning inference engine development and neural network algorithms, combined with Micron's deep memory expertise, unlocks new power and performance capabilities to enable innovation for the most complex and demanding edge applications."
- North America > United States > California > San Francisco County > San Francisco (0.25)
- North America > United States > Oregon (0.05)
- Semiconductors & Electronics (0.90)
- Information Technology > Hardware (0.36)
- Information Technology > Services (0.31)
- Information Technology > Hardware > Memory (0.75)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.38)
TVM: End-to-End Optimization Stack for Deep Learning
Chen, Tianqi, Moreau, Thierry, Jiang, Ziheng, Shen, Haichen, Yan, Eddie, Wang, Leyuan, Hu, Yuwei, Ceze, Luis, Guestrin, Carlos, Krishnamurthy, Arvind
Scalable frameworks, such as TensorFlow, MXNet, Caffe, and PyTorch drive the current popularity and utility of deep learning. However, these frameworks are optimized for a narrow range of server-class GPUs, and deploying workloads to other platforms such as mobile phones, embedded devices, and specialized accelerators (e.g., FPGAs, ASICs) requires laborious manual effort. We propose TVM, an end-to-end optimization stack that exposes graph-level and operator-level optimizations to provide performance portability for deep learning workloads across diverse hardware back-ends. We discuss the optimization challenges specific to deep learning that TVM solves: high-level operator fusion, low-level memory reuse across threads, mapping to arbitrary hardware primitives, and memory latency hiding. Experimental results demonstrate that TVM delivers performance across hardware back-ends that is competitive with state-of-the-art libraries for low-power CPUs and server-class GPUs. We also demonstrate TVM's ability to target new hardware accelerator back-ends by targeting an FPGA-based generic deep learning accelerator. The compiler infrastructure is open-sourced.
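The flavour of the graph-level operator fusion mentioned above can be sketched in a few lines (an assumption for illustration, not TVM's IR or pass infrastructure): elementwise ops are folded into the compute-heavy producer they follow, so the intermediate tensor never has to round-trip through memory.

```python
# Hypothetical op categories for the sketch; real compilers derive these
# from per-operator pattern annotations.
FUSABLE_PRODUCERS = {"conv2d", "dense"}
ELEMENTWISE = {"relu", "add_bias", "sigmoid"}

def fuse(ops):
    """Greedily merge each elementwise op into the fusable op before it,
    producing combined kernel names for a linear op sequence."""
    out = []
    for op in ops:
        if out and op in ELEMENTWISE and out[-1].split("+")[0] in FUSABLE_PRODUCERS:
            out[-1] = out[-1] + "+" + op   # extend the fused kernel
        else:
            out.append(op)                 # start a new kernel
    return out
```

On a sequence like `conv2d, add_bias, relu, pool, dense, relu` this yields three kernels instead of six, which is the kind of memory-traffic saving operator fusion buys on accelerator back-ends.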
- North America > United States > New York > New York County > New York City (0.04)
- North America > United States > New Jersey > Middlesex County > Piscataway (0.04)
- North America > United States > District of Columbia > Washington (0.04)
- (2 more...)